ECAGP Air Quality Analysis

1 Overview

This code gives an overview of:

  • Loading air quality data from QuantAQ instruments
  • Performing initial summary analysis
  • Loading meteorology data from public sources
  • Combining air quality and meteorology data
  • Conducting exploratory analysis of how air pollutants vary with time and meteorology

2 AIR QUALITY DATA LOADING

2.1 Loading initial packages

2.2 Load Air Quality Data file.

2.3 Summary statistics

##       pm1               pm25              pm10         
##  Min.   :  0.054   Min.   :  0.221   Min.   :   0.221  
##  1st Qu.:  3.156   1st Qu.:  5.319   1st Qu.:  15.425  
##  Median :  5.010   Median :  7.705   Median :  26.910  
##  Mean   :  6.156   Mean   :  8.792   Mean   :  41.722  
##  3rd Qu.:  8.046   3rd Qu.: 11.129   3rd Qu.:  46.568  
##  Max.   :270.678   Max.   :351.924   Max.   :8277.649  
##  NA's   :35        NA's   :35        NA's   :35        
##  timestamp_local.x                    sn.x          
##  Min.   :2024-07-01 20:26:12.00   Length:508006     
##  1st Qu.:2024-08-17 15:36:42.75   Class :character  
##  Median :2024-10-01 21:50:38.00   Mode  :character  
##  Mean   :2024-10-07 11:58:00.75                     
##  3rd Qu.:2024-11-23 08:20:55.50                     
##  Max.   :2025-02-05 23:59:32.00                     
##                                                     
##   timestamp.x                        met.xrh        met.xtemp    
##  Min.   :2024-07-01 20:26:12.00   Min.   : 9.56   Min.   :-3.14  
##  1st Qu.:2024-08-17 19:36:42.75   1st Qu.:46.21   1st Qu.:22.84  
##  Median :2024-10-02 01:50:38.00   Median :63.25   Median :28.34  
##  Mean   :2024-10-07 15:42:28.66   Mean   :60.34   Mean   :27.21  
##  3rd Qu.:2024-11-23 13:20:55.50   3rd Qu.:75.11   3rd Qu.:32.22  
##  Max.   :2025-02-06 04:59:32.00   Max.   :99.04   Max.   :46.41  
##                                                                  
##      pm1num             sn                 lat             lon        
##  Min.   :  0.000   Length:508006      Min.   :29.73   Min.   :-95.24  
##  1st Qu.:  7.993   Class :character   1st Qu.:29.73   1st Qu.:-95.24  
##  Median : 13.094   Mode  :character   Median :29.73   Median :-95.24  
##  Mean   : 16.852                      Mean   :29.73   Mean   :-95.24  
##  3rd Qu.: 21.116                      3rd Qu.:29.73   3rd Qu.:-95.24  
##  Max.   :256.040                      Max.   :29.73   Max.   :-95.24  
##                                                                       
##    sitename         mod_date_1min                    original_met_time 
##  Length:508006      Min.   :2024-07-01 20:26:00.00   Length:508006     
##  Class :character   1st Qu.:2024-08-17 15:37:00.00   Class :character  
##  Mode  :character   Median :2024-10-01 21:51:00.00   Mode  :character  
##                     Mean   :2024-10-07 12:18:11.82                     
##                     3rd Qu.:2024-11-23 08:21:00.00                     
##                     Max.   :2025-02-06 00:00:00.00                     
##                                                                        
##       tmpc             wd            ws         timestamp_local.y 
##  Min.   :-6.67   Min.   :  0   Min.   : 0.000   Length:508006     
##  1st Qu.:19.44   1st Qu.: 40   1st Qu.: 2.056   Class :character  
##  Median :25.56   Median :120   Median : 3.084   Mode  :character  
##  Mean   :23.81   Mean   :129   Mean   : 3.148                     
##  3rd Qu.:28.89   3rd Qu.:180   3rd Qu.: 4.112                     
##  Max.   :38.89   Max.   :360   Max.   :26.213                     
##  NA's   :13      NA's   :13    NA's   :13                         
##       date                            name          
##  Min.   :2024-07-01 20:26:12.00   Length:508006     
##  1st Qu.:2024-08-17 15:36:42.75   Class :character  
##  Median :2024-10-01 21:50:38.00   Mode  :character  
##  Mean   :2024-10-07 12:18:05.05                     
##  3rd Qu.:2024-11-23 08:20:55.50                     
##  Max.   :2025-02-05 23:59:32.00                     
## 

2.4 Date formatting

2.5 Time series

3 CLEANING STEPS

3.1 Define threshold values

3.2 Remove outliers
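
The thresholds used for cleaning are not shown in this report, so the limits below are placeholders, and the small data frame is made up to stand in for the QuantAQ data. As a minimal sketch in base R, one way to apply per-pollutant thresholds is:

```r
# Hypothetical upper limits in µg/m³; the real thresholds are not shown in this report.
thresholds <- c(pm1 = 500, pm25 = 500, pm10 = 2000)

# Made-up rows standing in for the QuantAQ data.
aq <- data.frame(
  sn   = c("A", "A", "B", "B"),
  pm1  = c(5.0, 6.2, 4.1, NA),
  pm25 = c(7.7, 9.0, 600, 8.1),
  pm10 = c(26.9, 8277.6, 30.2, 41.0)
)

# Set values above the threshold to NA instead of dropping whole rows,
# so a spike in one channel does not discard the other channels.
aq_clean <- aq
for (p in names(thresholds)) {
  too_high <- !is.na(aq_clean[[p]]) & aq_clean[[p]] > thresholds[[p]]
  aq_clean[[p]][too_high] <- NA
}

# Number of missing values per pollutant after cleaning.
colSums(is.na(aq_clean[names(thresholds)]))
```

Masking with NA is a design choice made for this sketch; dropping the affected rows entirely is an equally valid alternative if every channel from a suspect reading should be discarded.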

3.3 Sanity check time series - did you do your cleaning job?

3.4 Did you find any funky time periods that need to be removed from the data? If so, filter by Date range.

4 EXPLORATORY DATA ANALYSIS

We can answer a number of questions with air quality data. Some examples include: What is the air quality like now? Where is the air quality bad (now/typically)? When was AQ bad? What time of day should I (not) go outside? Where is my pollution coming from? How many bad pollution days were there this year? What fraction of the time was AQ good, bad, or in the middle? We’ll use the R package openair to explore answers to these questions with data.

4.0.1 Calendar Plots: When was air quality bad? How many bad days were there in the last year?

4.1 When PM1 was bad, what else was bad?

4.1.1 Explore some scatterplots

4.2 Diurnal Profiles: When is air quality (typically) bad? When is it typically (not) safe to go outside?

4.2.1 TrendLevel - when was air quality typically bad?

4.3 Directional analysis of pollutants: Where is pollution bad? And where is pollution coming from?

4.3.1 Create polar plots (and other things in that family)

## # A tibble: 4 × 5
##   cluster mean_pm10      n n_percent pm10_percent
##   <chr>       <dbl>  <int>     <dbl>        <dbl>
## 1 C1           49.3    240       0            0.1
## 2 C2           36.1   7837       1.5          1.3
## 3 C3           48.3 312447      61.5         71.2
## 4 C4           31.0 187434      36.9         27.4
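
The two percentage columns in the cluster table are consistent with a share of observations (n_percent) and a share of summed PM10, mean_pm10 × n (pm10_percent). The sketch below recomputes both from the printed values; that reading of the columns is an inference from the numbers, not something the report states.

```r
# Values copied from the cluster summary table above.
clusters <- data.frame(
  cluster   = c("C1", "C2", "C3", "C4"),
  mean_pm10 = c(49.3, 36.1, 48.3, 31.0),
  n         = c(240, 7837, 312447, 187434)
)

# Share of observations that fall in each cluster.
clusters$n_percent <- round(100 * clusters$n / sum(clusters$n), 1)

# Share of the summed PM10 (mean * n) contributed by each cluster.
pm10_sum <- clusters$mean_pm10 * clusters$n
clusters$pm10_percent <- round(100 * pm10_sum / sum(pm10_sum), 1)

clusters
```

Read this way, cluster C3 holds about 61.5% of the readings but about 71.2% of the summed PM10, because its mean is higher than that of C4.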

4.3.2 Create Polar map plots

5 WQ

6 More General EDA

6.1 Scatter plot (work in progress)

6.2 Time series of all PM levels

  • The detail in PM1 and PM2.5 was not visible in this plot.
  • Blue lines: EPA limits (PM2.5: 35 µg/m³, PM10: 150 µg/m³). Red lines: average level excluding the 12/16-12/18 data.
  • “This standard should not be exceeded more than once per year on average over three years.”

6.3 Histogram of Daily Average PM10 Levels

  • The EPA 24-hour standard for PM10 is 150 µg/m³.
  • This standard should not be exceeded more than once per year on average over three years.
  • The daily averages are shown to have exceeded it twice within the 7-month period (2024-07-31: 145.164072).
  • Average PM10 by sensor: MOD-PM-01395 38.37543, MOD-PM-01396 45.22899; overall average 41.80221.
## # A tibble: 2 × 2
##   sn           overall_avg
##   <chr>              <dbl>
## 1 MOD-PM-01395        38.4
## 2 MOD-PM-01396        45.2
## Overall Average PM10: 41.80221
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
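
A minimal sketch of how daily averages could be compared with the 24-hour PM10 standard, assuming a data frame with columns sn, date and pm10 as in the summary output. The readings below are made up, and the use of dplyr here is an assumption about tooling rather than the report's actual code.

```r
library(dplyr)

# Made-up readings for two sensors over three days.
stamps <- as.POSIXct(c("2024-07-30 08:00", "2024-07-30 20:00",
                       "2024-07-31 08:00", "2024-07-31 20:00",
                       "2024-08-01 08:00", "2024-08-01 20:00"), tz = "UTC")
aq <- data.frame(
  sn   = rep(c("MOD-PM-01395", "MOD-PM-01396"), each = 6),
  date = rep(stamps, times = 2),
  pm10 = c(40, 50, 150, 170, 30, 20,
           60, 70, 140, 150, 35, 45)
)

limit_24h <- 150  # EPA 24-hour PM10 standard, µg/m³

# Average each sensor's readings within each calendar day.
daily <- aq %>%
  mutate(day = as.Date(date)) %>%
  group_by(sn, day) %>%
  summarise(daily_avg = mean(pm10, na.rm = TRUE), .groups = "drop")

# Days whose average is strictly above the limit.
exceedances <- filter(daily, daily_avg > limit_24h)
exceedances
```

Under this strict greater-than comparison, a daily mean of 145.164072 would fall below the 150 µg/m³ limit, so it is worth checking which comparison and which dates sit behind the exceedance count noted above.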

6.4 Histogram of Daily Average PM2.5 Levels

  • EPA annual PM2.5 limits: primary 9.0 µg/m³, secondary 15.0 µg/m³ (annual mean, averaged over 3 years).
  • The average over the 7-month period is 8.807473.
  • Average PM2.5 by sensor: MOD-PM-01395 9.537952, MOD-PM-01396 8.076994; overall average 8.807473.
## # A tibble: 2 × 2
##   sn           overall_avg
##   <chr>              <dbl>
## 1 MOD-PM-01395        9.54
## 2 MOD-PM-01396        8.08
## Overall Average PM2.5: 8.807473
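
The reported overall averages match the unweighted mean of the two per-sensor averages, which can be checked directly from the printed numbers. A pooled mean over all rows would instead weight the sensor with more readings more heavily, so the two figures need not agree.

```r
# Per-sensor averages copied from the output above.
pm10_by_sensor <- c("MOD-PM-01395" = 38.37543, "MOD-PM-01396" = 45.22899)
pm25_by_sensor <- c("MOD-PM-01395" = 9.537952, "MOD-PM-01396" = 8.076994)

mean(pm10_by_sensor)  # 41.80221
mean(pm25_by_sensor)  # 8.807473

# Compare each sensor's PM2.5 average with the 9.0 µg/m³ annual primary limit.
pm25_by_sensor > 9.0
```

Taken separately, MOD-PM-01395 sits above the 9.0 µg/m³ primary limit and MOD-PM-01396 below it, although a 7-month average is not directly comparable to an annual mean averaged over three years.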

7 12/16-12/18 Data

7.1 Time series of 12/16-12/18 with epa limits

  • In the calendar plots of PM2.5 and PM10, 12/16-12/18 stand out.
  • Sensor 1395 is missing data from Dec 16 to Dec 17, 9 am.
  • Sensor 1396 sits further to the southeast and generally reads higher.
  • For 1396, the blue and red lines overlap.
## Sensor MOD-PM-01395 rows: 914

## Sensor MOD-PM-01396 rows: 2875

7.2 Polar Plots of 12/16-12/18

  • Both sensors show the highest concentrations toward the bottom right (southeast) of the plot.
  • The signal is much stronger on 1396.
## Sensor MOD-PM-01395 rows: 914

## # A tibble: 4 × 5
##   cluster mean_pm10     n n_percent pm10_percent
##   <chr>       <dbl> <int>     <dbl>        <dbl>
## 1 C1           71.4   220      24.1         19  
## 2 C2           74.2   425      46.5         38.2
## 3 C3           86.5    53       5.8          5.5
## 4 C4          143.    216      23.6         37.3
## Sensor MOD-PM-01396 rows: 2875

## # A tibble: 4 × 5
##   cluster mean_pm10     n n_percent pm10_percent
##   <chr>       <dbl> <int>     <dbl>        <dbl>
## 1 C1          125.    508      17.7         14.6
## 2 C2           62.7   238       8.3          3.4
## 3 C3          138.   1148      39.9         36.4
## 4 C4          202.    981      34.1         45.6

7.3 Annulus polar plot map

## Warning: There were 2 warnings in `dplyr::mutate()`.
## The first warning was:
## ℹ In argument: `data = purrr::map(data, prepare.grid)`.
## Caused by warning:
## ! There was 1 warning in `mutate()`.
## ℹ In argument: `data = purrr::map(data, prepare.grid)`.
## ℹ In group 1: `default = 15 December 2024 to 17 December 2024`.
## Caused by warning in `smooth.construct.cc.smooth.spec()`:
## ! basis dimension, k, increased to minimum possible
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.

8 Missing entries

  • In the time series, a substantial number of gaps show up as straight lines.
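
The gaps can also be listed explicitly rather than read off the plot. The sketch below flags any step between consecutive readings that is longer than a tolerance; the timestamps are made up and the 5-minute tolerance is an assumption.

```r
# Made-up timestamps for one sensor: five 1-minute readings, a hole, five more.
ts <- as.POSIXct("2024-12-16 00:00:00", tz = "UTC") + 60 * c(0:4, 185:189)

# Minutes between consecutive readings.
step_min <- as.numeric(diff(ts), units = "mins")

# Flag steps longer than the chosen tolerance (5 minutes is an assumption).
gap_idx <- which(step_min > 5)
gaps <- data.frame(
  gap_start = ts[gap_idx],
  gap_end   = ts[gap_idx + 1],
  minutes   = step_min[gap_idx]
)
gaps
```

With two sensors in the same table, the same calculation would need to be run per sensor (for example after splitting on sn), otherwise interleaved rows hide the gaps.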

8.1 Histogram of Entries by hour

  • The number of entries jumps at around hour 8 and gradually decreases at night.
  • This has been traced to the solar-powered units running without battery backup.
  • QuantAQ may have a program to upgrade the batteries for free.
## # A tibble: 24 × 2
##     hour entries
##    <int>   <int>
##  1    11   23134
##  2    12   23113
##  3    13   23029
##  4    10   22959
##  5    14   22926
##  6    15   22809
##  7     9   22734
##  8    16   22483
##  9    19   22296
## 10    17   22281
## # ℹ 14 more rows
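
A table like the one above can be produced by counting rows per hour of day. This is a sketch on made-up timestamps, assuming a POSIXct column named timestamp_local as in the summary output; the report's own code is not shown.

```r
library(dplyr)

# Made-up local timestamps: one reading per minute from 08:00 to 19:59 only.
ts <- seq(as.POSIXct("2024-10-01 08:00:00", tz = "UTC"),
          as.POSIXct("2024-10-01 19:59:00", tz = "UTC"), by = "1 min")
aq <- data.frame(timestamp_local = ts)

# Count readings per hour of day, busiest hours first.
entries_by_hour <- aq %>%
  mutate(hour = as.integer(format(timestamp_local, "%H"))) %>%
  count(hour, name = "entries") %>%
  arrange(desc(entries))

# Hours of the day with no readings at all.
missing_hours <- setdiff(0:23, entries_by_hour$hour)
missing_hours
```

Hours that never report do not appear in the count at all, so the setdiff step is what surfaces them.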

9 Where to look now?

9.1 Trendline without (12/16-12/18)

  • For December, the 3-5 am values come down but PM10 is still at a noticeable level.
  • The plot now emphasizes 0-2 am in February, which we will look into.

9.1.1 February

  • There are only 6 days of data, so they do not fully represent the month of February.
  • A large spike on the 5th skews the data.

9.1.2 Trendline without 12/16-12/18 and February

  • Even after removing all the outlier data, December still shows a high mean PM10 from 3-5 am.
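
The table behind a month-by-hour view with those periods removed can be sketched as follows. The readings are made up, the column names date and pm10 follow the summary output, and dplyr is assumed; this computes roughly the per-cell means that a month-by-hour trend plot displays.

```r
library(dplyr)

# Made-up PM10 readings (illustrative values only).
aq <- data.frame(
  date = as.POSIXct(c("2024-12-10 03:00", "2024-12-10 15:00",
                      "2024-12-17 03:00",   # inside the excluded event
                      "2025-01-05 03:00",
                      "2025-02-05 01:00"),  # inside the excluded partial month
                    tz = "UTC"),
  pm10 = c(60, 20, 400, 30, 300)
)

event_days <- as.Date(c("2024-12-16", "2024-12-17", "2024-12-18"))

# Drop the event days and February, then average by month and hour of day.
month_hour <- aq %>%
  mutate(day   = as.Date(date),
         month = format(date, "%Y-%m"),
         hour  = as.integer(format(date, "%H"))) %>%
  filter(!day %in% event_days, month != "2025-02") %>%
  group_by(month, hour) %>%
  summarise(mean_pm10 = mean(pm10, na.rm = TRUE), .groups = "drop")

month_hour
```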

9.2 Polar Plots of PM10 by season

  • Higher PM10 tends to come from the southeast of both sensors.

9.3 Histograms of 10 min + 60 min moving averages

## # A tibble: 220 × 2
##    date       daily_peak
##    <date>          <dbl>
##  1 2024-07-02      217. 
##  2 2024-07-03      183. 
##  3 2024-07-04      452. 
##  4 2024-07-05      573. 
##  5 2024-07-06      217. 
##  6 2024-07-07       69.5
##  7 2024-07-08      122. 
##  8 2024-07-09       62.7
##  9 2024-07-10       54.3
## 10 2024-07-11      206. 
## # ℹ 210 more rows

## # A tibble: 220 × 2
##    date       daily_peak
##    <date>          <dbl>
##  1 2024-07-02      133. 
##  2 2024-07-03      132. 
##  3 2024-07-04      171. 
##  4 2024-07-05      303. 
##  5 2024-07-06      161. 
##  6 2024-07-07       53.8
##  7 2024-07-08       61.5
##  8 2024-07-09       53.7
##  9 2024-07-10       49.6
## 10 2024-07-11       80.9
## # ℹ 210 more rows
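
The daily_peak tables above are consistent with taking a moving average of PM10 and then the highest value reached on each day. A minimal base R sketch on a made-up 1-minute series follows; the helper names rolling_mean and daily_peak are hypothetical, and the report may well have used a different rolling-mean function.

```r
# Made-up 1-minute PM10 series covering two days.
n <- 2 * 24 * 60
date <- as.POSIXct("2024-07-04 00:00:00", tz = "UTC") + 60 * (0:(n - 1))
pm10 <- rep(20, n)
pm10[601:605]   <- 500   # a 5-minute spike on day 1
pm10[2001:2120] <- 100   # a 2-hour plateau on day 2

# Trailing moving average over the last `width` readings (hypothetical helper).
rolling_mean <- function(x, width) {
  as.numeric(stats::filter(x, rep(1 / width, width), sides = 1))
}

ma10 <- rolling_mean(pm10, 10)
ma60 <- rolling_mean(pm10, 60)

# Highest moving average reached on each calendar day (hypothetical helper).
daily_peak <- function(ma, date) {
  tapply(ma, as.Date(date), max, na.rm = TRUE)
}

peak10 <- daily_peak(ma10, date)
peak60 <- daily_peak(ma60, date)
```

A short spike raises the 10-minute peak far more than the 60-minute peak, while a sustained plateau gives similar peaks in both, which is one way to separate brief bursts from longer episodes. The window here counts rows, so it only corresponds to 10 or 60 minutes when the readings are evenly spaced with no gaps.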

9.3.1 Time series of the top 10 days from moving averages

## Warning in grabDL(warn, wrap, wrap.grobs, ...): one or more grobs overwritten
## (grab WILL not be faithful; try 'wrap.grobs = TRUE')

10 Hourly polar plots for PM10

  • k = 100 for the seasonal plot
  • k = 50 for the hourly plot
  • Note that the default is k = 100.
  • When k = 100, the summer data are gone, hours 00-07 in autumn are gone, and in winter only hour 13 for 1396 remains.
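
One possible explanation for panels disappearing at k = 100: openair's polar plots pass k to an mgcv GAM smooth, and a smooth with basis dimension k cannot be fitted when there are fewer distinct data points than k. The sketch below shows that constraint on made-up data using mgcv directly; whether this is what removes the panels here is an assumption, not something the report confirms.

```r
library(mgcv)

set.seed(1)
# 40 made-up points in wind-component space (u, v) with a PM10 value.
sparse <- data.frame(u = runif(40, -5, 5), v = runif(40, -5, 5))
sparse$pm10 <- 30 + 2 * sparse$u + rnorm(40)

# TRUE if a two-dimensional smooth with basis dimension k can be fitted.
fits <- function(k) {
  fit <- try(gam(pm10 ~ s(u, v, k = k), data = sparse), silent = TRUE)
  !inherits(fit, "try-error")
}

fits(30)  # fewer basis functions than distinct points
fits(50)  # more basis functions than distinct points
```

Splitting the data by season and hour leaves each panel with far fewer points than the full data set, so a smaller k such as 50 is more likely to fit in every panel, at the cost of a smoother, less detailed surface.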

11 Diurnal Plots of PM10

12 Diurnal Plots of PM2.5
